How to extract bold or italic characters from a docx?

I have a .docx of articles published in marketing magazines that contain scales, and I would like to extract them.

enter image description here

For example, with the following scale, I would like to get the title ("Ten-Item and Five-Item Personality Inventories"), the question stems ("I see myself as:"), and the answers.

{
   "title":"Ten-Item and Five-Item Personality Inventories",
   "scales":{
      "I see myself as":{
         "answer0":"1. Extraverted, enthusiastic",
         "answer1":"2. Critical, quarrelsome",
         "answer2":"3. Dependable, self-disciplined",
         ...
      },
      "I see myself as":{
         "answer0":"Extraverted, enthusiastic (that is, sociable ... ",
         "answer1":"2. Agreeable, kind ...",
         ...
      }
   }
}

or something similar, if duplicated scale names are a problem.
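
One possible alternative layout, if duplicated scale names are a problem, is to hold the scales in a list instead of keying them by name. This is only a sketch: the field names `stem` and `answers` are hypothetical and not part of any library.

```python
import json

# Hypothetical layout: a list of scales, so two scales may share the same stem.
inventory = {
    "title": "Ten-Item and Five-Item Personality Inventories",
    "scales": [
        {
            "stem": "I see myself as",
            "answers": ["1. Extraverted, enthusiastic", "2. Critical, quarrelsome"],
        },
        {
            "stem": "I see myself as",
            "answers": ["2. Agreeable, kind ..."],
        },
    ],
}

# Both scales survive a JSON round trip because list items never collide,
# whereas two identical dict keys would collapse into one.
restored = json.loads(json.dumps(inventory))
```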

So far, I can only extract the pieces separately, with the following code:

# !pip install python-docx
from docx import Document

document = Document('/content/drive/My Drive/Books/handbook-of-marketing-scale-2011.docx')

dict_open = False  # True while collecting answers for the current bold run
qas = {}           # bold text -> {number: answer}
qa = {}            # answers collected for the current bold text
for para in document.paragraphs:
    for run in para.runs:
        try:
            if run.bold and not dict_open:
                # open a dict of qa until next bold
                dict_open = True
                question = run.text
            elif run.text[0].isdigit() and dict_open:
                number = run.text.split(".")[0]
                answer = run.text.split(".")[-1]
                qa[number] = answer
            elif run.bold and dict_open and qa:
                # close dict
                qas[question] = qa
                qa = {}
                dict_open = False
        except IndexError:
            # run.text is empty, so run.text[0] fails; skip this run
            pass

But I get:

{' ': {'603': '603'},
 '(Articles Containing Inter-/Intrafirm-Related Measures)': {'11, 186–93': '',
  '121–123': '121–123',
  '127–128': '127–128',
  '140–142': '140–142',
  '146–147': '146–147',
  '18, 21': '18, 21',
  '183–187': '183–187',
  '222–224': '222–224',
  '23, 387–93': '',
  '230–231': '230–231',
  '240–241, 243–244': '240–241, 243–244',
  '243–244': '243–244',
  '256–257': '256–257',
  '270–271': '270–271',
  '272–275, 285': '272–275, 285',
  '278–280': '278–280',
  '281–282': '281–282',
  '292–294': '292–294',
  '302–305': '302–305',
  '321–323': '321–323',
  '336–338': '336–338',
  '339–340': '339–340',
  '347–349': '347–349',
  '356–357': '356–357',
  '374, 376': '374, 376',
  '399–401': '399–401',
  '416–418': '416–418',
  '419–421': '419–421',
  '423–426': '423–426',
  '429–431': '429–431',
  '430, 431': '430, 431',
  '446–447': '446–447',
  '448–450': '448–450',
  '456–458': '456–458',
  '487, 490': '487, 490',
  '489–490, 492': '489–490, 492',
  '49–51': '49–51',
  '501–503': '501–503',
  '512–514': '512–514',
  '515–517': '515–517',
  '526–527': '526–527',
  '531–534': '531–534',
  '545–548': '545–548',
  '549–551': '549–551',
  '561–562, 564': '561–562, 564',
  '576–577': '576–577',
  '580–583': '580–583',
  '584, 586–587': '584, 586–587',
  '589': '589',
  '590': '590',
  '592': '592',
  '594': '594',
  '596': '596',
  '598': '598',
  '600': '600'},
 'A Diagnostic Tool/Clinical Screener for Classifying Compulsive Consumers': {'1 really believe that having more money would solve most of my problems': ''},
 'Adaptive Selling: ADAPTS': {'1990 by the American Marketing Association': '',
  '544\tHANDBOOK OF MARKETING SCALES': '544\tHANDBOOK OF MARKETING SCALES'},
 'Agents’ Socially Desirable Responding: ASDR Scale': {'477': '477',
  '478\tHANDBOOK OF MARKETING SCALES': '478\tHANDBOOK OF MARKETING SCALES'},
 'Alliance Orientation': {'554\tHANDBOOK OF MARKETING SCALES': '554\tHANDBOOK OF MARKETING SCALES'},
 'Analytic/Holistic Thinking Scale: AHS': {'288\tHANDBOOK OF MARKETING SCALES': '288\tHANDBOOK OF MARKETING SCALES',
  '51, 407–15': ''},
 'Appendix to General Values': {'162': '162'},
 'Appendix to SERVQUAL: Review and Sources of SERVQUAL Use': {'21 (1), 1–12': '',
  '30, 7–27': '',
  '428\tHANDBOOK OF MARKETING SCALES': '428\tHANDBOOK OF MARKETING SCALES',
  '64, 12–40': ''},
 'Attitudes About the Performance of Business Firms, Satisfaction and': {'1': '1'},
 'Behavioral Identification Form: BIF': {'292\tHANDBOOK OF MARKETING SCALES': '292\tHANDBOOK OF MARKETING SCALES'},
 'Behavioral Inhibition and Behavioral Activation Systems: BIS/BAS Scales': {'222': '222'},
 'Brand Experience Scale': {'144 unspecified participants; and Study 6,': '144 unspecified participants; and Study 6,',
  '150 mall/street shoppers; Study 5,': '150 mall/street shoppers; Study 5,',
  '209 university students': ''},
 'Brief Mood Introspection Scale: BMIS': {'302\tHANDBOOK OF MARKETING SCALES': '302\tHANDBOOK OF MARKETING SCALES'},
 'Centrality of Visual Product Aesthetics': {'356\tHANDBOOK OF MARKETING SCALES': '356\tHANDBOOK OF MARKETING SCALES'},
 'Chapter 2': {'148\tHANDBOOK OF MARKETING SCALES': '148\tHANDBOOK OF MARKETING SCALES'},
 'Chapter 3': {'226': '226'},
 'Chapter 4': {'57 (4), 660–71': ''},
 'Chapter 5': {'30, 234–45': '',
  '386\tHANDBOOK OF MARKETING SCALES': '386\tHANDBOOK OF MARKETING SCALES'},
 'Chapter 6': {'7': '7'},
 'Chapter 7': {'5': '',
  '588\tHANDBOOK OF MARKETING SCALES': '588\tHANDBOOK OF MARKETING SCALES'},
 'Cognitive and Sensory Innovativeness': {'110\tHANDBOOK OF MARKETING SCALES': '110\tHANDBOOK OF MARKETING SCALES'},
 'Comparing Four Modified Involvement Scales': {'12, 341–52': '',
  '12, 663–82': '',
  '1995 by John Wiley & Sons, Inc': '',
  '22, 41–53': '',
  '24–38': ''},
 'Construct': {'1983 by the American Marketing Association': '',
  '560\tHANDBOOK OF MARKETING SCALES': '560\tHANDBOOK OF MARKETING SCALES'},
 'Construct:': {'1984 by the American Marketing Association': '',
  '226–33': '',
  '586\tHANDBOOK OF MARKETING SCALES': '586\tHANDBOOK OF MARKETING SCALES'},
 'Construct: ': {'6': '6'},
 'Consumer Attitudes Toward Business Practices and Marketing': {'387': '387',
  '388\tHANDBOOK OF MARKETING SCALES': '388\tHANDBOOK OF MARKETING SCALES'},
 'Consumer Attitudes Toward Marketing and Consumerism': {'392\tHANDBOOK OF MARKETING SCALES': '392\tHANDBOOK OF MARKETING SCALES'},
 'Consumer Attitudes to Debt': {'190': '190', '3 ': '3 '},
 'Consumer Involvement Profiles: CIP': {'244\tHANDBOOK OF MARKETING SCALES': '244\tHANDBOOK OF MARKETING SCALES'},
 'Consumer Self-Confidence: CSC': {'22\tHANDBOOK OF MARKETING SCALES': '22\tHANDBOOK OF MARKETING SCALES',
  '5-point scale labeled 1 ': '5-point scale labeled 1 '},
 'Consumer Spending Self-Control: CSSC': {'82\tHANDBOOK OF MARKETING SCALES': '82\tHANDBOOK OF MARKETING SCALES'},
 'Consumer Susceptibility to Reference Group Influence': {'1)': '',
  '3,': '3,',
  '4,': '4,'},
 'Consumers’ Emotional Attachments to Brands': {'1': '1',
  '358\tHANDBOOK OF MARKETING SCALES': '358\tHANDBOOK OF MARKETING SCALES'},
 'Control: Supervisory Control': {'526\tHANDBOOK OF MARKETING SCALES': '526\tHANDBOOK OF MARKETING SCALES'},
 'Convergent, Discriminant, and Nomological Validity': {'10': '10'},
 'Country Image Scale': {'1993 by Elsevier Science': '',
  '84\tHANDBOOK OF MARKETING SCALES': '84\tHANDBOOK OF MARKETING SCALES'},
 'Culture: Organizational Culture': {'460\tHANDBOOK OF MARKETING SCALES': '460\tHANDBOOK OF MARKETING SCALES'},
 'Customer Orientation of Salespeople: SOCO': {'1—True for NONE of your customers—NEVER': '1—True for NONE of your customers—NEVER',
  '2—True for ALMOST NONE': '2—True for ALMOST NONE',
  '3—True for A FEW': '3—True for A FEW',
  '4—True for SOMEWHAT LESS THAN HALF': '4—True for SOMEWHAT LESS THAN HALF',
  '5—True for ABOUT HALF': '5—True for ABOUT HALF',
  '6—True for SOMEWHAT MORE THAN HALF': '6—True for SOMEWHAT MORE THAN HALF',
  '7—True for a LARGE MAJORITY': '7—True for a LARGE MAJORITY',
  '8—True for ALMOST ALL': '8—True for ALMOST ALL',
  '9—True for ALL of your customers—ALWAYS': '9—True for ALL of your customers—ALWAYS'},
 'Customer-Based Reputation of a Service Firm: CBR Scale': {'103 (3),411–23': '',
  '398\tHANDBOOK OF MARKETING SCALES': '398\tHANDBOOK OF MARKETING SCALES'},
 'Economic and Social Satisfaction': {'76 (Spring), 11–32': ''},
 'Emotions: Dimensions of Emotions: PAD': {'4 to –4 basis': ''},
 'Ethics: Corporate Ethics Scale: CEP': {'1993 by Sage Publications': '',
  '454\tHANDBOOK OF MARKETING SCALES': '454\tHANDBOOK OF MARKETING SCALES'},
 'Expertise, Trustworthiness, and Attractiveness of Celebrity Endorsers': {'1993 by the American Marketing Association': ''},
 'Feelings Toward Ads': {'320\tHANDBOOK OF MARKETING SCALES': '320\tHANDBOOK OF MARKETING SCALES'},
 'Feelings TowardAds': {'317': '317',
  '318\tHANDBOOK OF MARKETING SCALES': '318\tHANDBOOK OF MARKETING SCALES'},
 'Gender Dimensions of Brand Personality': {'348\tHANDBOOK OF MARKETING SCALES': '348\tHANDBOOK OF MARKETING SCALES'},
 'General Self-Control': {'80\tHANDBOOK OF MARKETING SCALES': '80\tHANDBOOK OF MARKETING SCALES'},
 'General Values': {'151': '151', '152': '152'},
 'Health Consciousness Scale: HCS': {'176': '176'},
 'Hedonic Shopping Motivations': {'360\tHANDBOOK OF MARKETING SCALES': '360\tHANDBOOK OF MARKETING SCALES'},
 'Horizontal and Vertical Individualism and Collectivism': {'56\tHANDBOOK OF MARKETING SCALES': '56\tHANDBOOK OF MARKETING SCALES'},
 'In general,': {'4': '4'},
 'Influence Strategies in Marketing Channels': {'558\tHANDBOOK OF MARKETING SCALES': '558\tHANDBOOK OF MARKETING SCALES'},
 'Informational and Transformational Ad Content': {'322\tHANDBOOK OF MARKETING SCALES': '322\tHANDBOOK OF MARKETING SCALES'},
 'Involvement General to Several Product Classes': {'237': '237',
  '238\tHANDBOOK OF MARKETING SCALES': '238\tHANDBOOK OF MARKETING SCALES'},
 'Jain and Srinivasan (1990) CIP Scale': {'246\tHANDBOOK OF MARKETING SCALES': '246\tHANDBOOK OF MARKETING SCALES'},
 'Job Diagnostic Survey: JDS': {'498\tHANDBOOK OF MARKETING SCALES': '498\tHANDBOOK OF MARKETING SCALES'},
 'Long-Term Orientation: LTO': {'30\tHANDBOOK OF MARKETING SCALES': '30\tHANDBOOK OF MARKETING SCALES'},
 'Materialistic Attitudes: MMA': {'204': '204'},
 'Measure of CRM Process and Its Impact on Performance': {'458\tHANDBOOK OF MARKETING SCALES': '458\tHANDBOOK OF MARKETING SCALES'},
 'Mood Short Form: MSF': {'54, 1063–70': ''},
 'Moral Identity': {'166': '166'},
 'Need to Evaluate: NES': {'3 ': '3 ', '36': ''},
 'New Measure of Brand Personality: NMBP': {'350\tHANDBOOK OF MARKETING SCALES': '350\tHANDBOOK OF MARKETING SCALES'},
 'Opinion Leadership and Information Seeking': {'104\tHANDBOOK OF MARKETING SCALES': '104\tHANDBOOK OF MARKETING SCALES'},
 'Organizational Citizenship Behaviors: OCBs': {'1993 by the American Marketing Association': '',
  '57, 70–80': ''},
 'Organizational Commitment': {'540\tHANDBOOK OF MARKETING SCALES': '540\tHANDBOOK OF MARKETING SCALES'},
 'Organizational Commitment: OCQ': {'536\tHANDBOOK OF MARKETING SCALES': '536\tHANDBOOK OF MARKETING SCALES'},
 'Other evidence': {'0': '16) as well as peer communication (beta '},
 'Other evidence:': {'27, 333–44': '', '64, 295–314': ''},
 'Other source': {'434\tHANDBOOK OF MARKETING SCALES': '434\tHANDBOOK OF MARKETING SCALES'},
 'Other source:': {'534\tHANDBOOK OF MARKETING SCALES': '534\tHANDBOOK OF MARKETING SCALES'},
 'Other sources': {'21, 522–33': ''},
 'Other sources:': {'12, 177–87': ''},
 'Other sources:   ': {'16, 321–38': '',
  '530\tHANDBOOK OF MARKETING SCALES': '530\tHANDBOOK OF MARKETING SCALES'},
 'Other sources:    ': {'51, 125–39': ''},
 'PII for Advertising: PIIA': {'258\tHANDBOOK OF MARKETING SCALES': '258\tHANDBOOK OF MARKETING SCALES'},
 'Perceived Leader Behavior Scales': {'1': ' Item scores can be summed within each factor to form indices for each of'},
 'Polychronic Attitude Index: PAI': {'230': '230'},
 'Positive and Negative Affect Scales (PANAS)': {'5': '5'},
 'Possessions: Attachment to Possessions': {'212': '212',
  '6 requires reverse scoring': ''},
 'Power and Influence in Group Settings': {'2': '2',
  '566\tHANDBOOK OF MARKETING SCALES': '566\tHANDBOOK OF MARKETING SCALES'},
 'Power: Distributor, Manufacturer, and Customer Market Power': {'1 ': '1 ',
  '2 ': '2 ',
  '20% of my customers account for 80% of my total product sales': '',
  '3 ': '3 ',
  '4 ': '4 ',
  '5 ': '5 ',
  '564\tHANDBOOK OF MARKETING SCALES': '564\tHANDBOOK OF MARKETING SCALES'},
 'Price Perception Scales': {'380\tHANDBOOK OF MARKETING SCALES': '380\tHANDBOOK OF MARKETING SCALES'},
 'Product Retention Tendency: PRT': {'214': '214'},
 'Public Opinion TowardAdvertising': {'334\tHANDBOOK OF MARKETING SCALES': '334\tHANDBOOK OF MARKETING SCALES'},
 'Purchase Decision Involvement: PDI': {'1': ' In selecting from many types and brands of this product available in the market, would you say that:',
  '1 2 3 4 5 6 7': '1 2 3 4 5 6 7',
  '1 2 3 4 5 6 7 I would care a great deal as to which one I buy': '',
  '3': ' How important would it be to you to make a right choice of this product?'},
 'Purchasing Involvement: PI': {'06, ns, ': '06, ns, '},
 'RPII and OPII': {'58) for eyeglasses (i': '91 ('},
 'Reference': {'112\tHANDBOOK OF MARKETING SCALES': '112\tHANDBOOK OF MARKETING SCALES'},
 'Reference:': {'1 ': '1 ',
  '1 feel alienated from top management': '',
  '510\tHANDBOOK OF MARKETING SCALES': '510\tHANDBOOK OF MARKETING SCALES'},
 'References': {'13, 405–9': '', '154': '154', '32, 547–49': ''},
 'References:': {'50 (1), 1–28': '',
  '514\tHANDBOOK OF MARKETING SCALES': '514\tHANDBOOK OF MARKETING SCALES'},
 'Regulatory Focus Composite Scale: RF-COMP': {'232': '232'},
 'Regulatory Focus Questionnaire: RFQ': {'236': '236'},
 'Reliability': {'8': '8'},
 'Role Ambiguity: Multifaceted, Multidimensional Role Ambiguity: MULTIRAM': {'1991 by American MarketingAssociationScale itemstaken fromAppendix(pp': '',
  '1991 by the Marketing Science Institute': '',
  '28, 328–38': ''},
 'Role Overload of the Wife': {'300\tHANDBOOK OF MARKETING SCALES': '300\tHANDBOOK OF MARKETING SCALES'},
 'Sales Performance Scale': {'520\tHANDBOOK OF MARKETING SCALES': '520\tHANDBOOK OF MARKETING SCALES'},
 'Sample:': {'28) was reportedfor the scale(Houseand Rizzo 1972,p': ''},
 'Samples': {'1987 by the American Marketing Association': ''},
 'Samples:': {'1974 by Southern Illinois University Press': ''},
 'Scales Related to Information Processing:': {'4])': ''},
 'Scales Related to Interpersonal Orientation, Needs/Preferences, and Self-Concept': {'15': '15',
  '16\tHANDBOOK OF MARKETING SCALES': '16\tHANDBOOK OF MARKETING SCALES'},
 'Scales Related to Post-Purchase Behavior: Consumer Discontent': {'430\tHANDBOOK OF MARKETING SCALES': '430\tHANDBOOK OF MARKETING SCALES'},
 'Scores': {'13, 121–37': '', '1996 by Elsevier Science': ''},
 'Scores:': {'1970 by ': '1970 by '},
 'Self-Monitoring Scale': {'146\tHANDBOOK OF MARKETING SCALES': '146\tHANDBOOK OF MARKETING SCALES'},
 'Service Convenience: SERVCON': {'420\tHANDBOOK OF MARKETING SCALES': '420\tHANDBOOK OF MARKETING SCALES'},
 'Service Quality of Retail Stores': {'410\tHANDBOOK OF MARKETING SCALES': '410\tHANDBOOK OF MARKETING SCALES'},
 'Service Quality: Physical Distribution Service Quality': {'424\tHANDBOOK OF MARKETING SCALES': '424\tHANDBOOK OF MARKETING SCALES'},
 'Situation-Specific Thinking Styles: STSS': {'5 ': '5 '},
 'Socially Responsible Consumption Behavior: SRCB': {'184': '184'},
 'Source': {'1992 by University of Chicago Press': ''},
 'Source:': {'1992 by the American Marketing Association': ''},
 'Sources': {'1981 by University of Chicago Press': '',
  '1986 by the Association for Consumer Research': ''},
 'Sources:': {'7-point scales ranging from 1 ': '7-point scales ranging from 1 '},
 'Style of Processing Scale: SOP': {'1982 by University of Chicago Press': '',
  '8, 407–17': ''},
 'Summary': {'12': '12',
  '16 (February), 64–73': '',
  '24 (4), 366–74': '',
  '28 (4), 674–89': '',
  '34, 100–17': '',
  '7 (3), 309–19': '',
  '70 (1), 172–94': '',
  '78, 98–104': '',
  '9 (June), 139–64': ''},
 'Table 3.3': {'2 (Summer), 5–18': ''},
 'Table 7.1': {'1989 by the American Marketing Association': ''},
 'The Technology Readiness Index (or Techqual™)': {'312)—specifically, OPT1-OPT10 (10 items),INN1-INN7 (7 items),DIS1-DIS10 (10 items), and INS1-INS9 (9 items)': ' Questions were answered on a 5-pointscale, where 1 '},
 'The eTail Quality Scale: eTailQ': {'416\tHANDBOOK OF MARKETING SCALES': '416\tHANDBOOK OF MARKETING SCALES'},
 'Uniqueness: Desire for Unique Consumer Products: DUCP': {'224 sample': ''},
 'Validity': {'1983 by Praeger': ''},
 'Validity:': {'35 (September), 382–97': ''},
 'Voluntary Simplicity Scale: VSS': {'188': '188'},
 'how often': {'574\tHANDBOOK OF MARKETING SCALES': '574\tHANDBOOK OF MARKETING SCALES'},
 'research tool': {'100,000 pages of SAGE book and': '100,000 pages of SAGE book and'},
 'tension': {'306\tHANDBOOK OF MARKETING SCALES': '306\tHANDBOOK OF MARKETING SCALES'},
 'very closely related': {'256\tHANDBOOK OF MARKETING SCALES': '256\tHANDBOOK OF MARKETING SCALES'}}
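
In the output above, page headers such as "544\tHANDBOOK OF MARKETING SCALES" and citation page ranges end up as answers, because any run that starts with a digit is accepted. One direction that might help is to work on whole paragraphs and accept only text shaped like "N. answer". The sketch below is not a tested solution for this book: `group_items`, `paragraph_pairs`, and the `stem`/`answers` field names are hypothetical, and it assumes each question stem is a fully bold paragraph, which may not hold for this document.

```python
import re

# A numbered item such as "1. Extraverted, enthusiastic". Page headers like
# "544\tHANDBOOK OF MARKETING SCALES" do not match: no "." follows the digits.
ITEM = re.compile(r"^\s*(\d+)\.\s+(.+)$")


def group_items(paragraphs):
    """Group numbered items under the most recent bold paragraph.

    `paragraphs` is an iterable of (text, is_bold) pairs.
    """
    scales = []
    current = None
    for text, is_bold in paragraphs:
        text = text.strip()
        if not text:
            continue
        if is_bold:
            # Assumption: a bold paragraph starts a new scale.
            current = {"stem": text, "answers": {}}
            scales.append(current)
            continue
        match = ITEM.match(text)
        if match and current is not None:
            current["answers"][match.group(1)] = match.group(2).strip()
    # Drop bold paragraphs that collected no numbered items.
    return [s for s in scales if s["answers"]]


def paragraph_pairs(document):
    """Assumed adapter for python-docx: one (text, is_bold) pair per paragraph."""
    for para in document.paragraphs:
        runs = [r for r in para.runs if r.text.strip()]
        # run.bold is None when bold is inherited from a style, so this
        # check only sees bold that is set directly on the runs.
        yield para.text, bool(runs) and all(r.bold for r in runs)


sample = [
    ("I see myself as:", True),
    ("1. Extraverted, enthusiastic", False),
    ("2. Critical, quarrelsome", False),
    ("544\tHANDBOOK OF MARKETING SCALES", False),
]
result = group_items(sample)
```

With a real file, `group_items(paragraph_pairs(document))` would be the intended call. The same idea should apply to italics by checking `run.italic` instead of `run.bold`; both can be `None` when the formatting comes from a paragraph or character style rather than from the run itself.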
            
No answers
